library(readr)
library(mosaic)
library(tidyverse)
library(pander)
library(car)
Math425PastGrades <- read_csv("~/Downloads/Math425PastGrades.csv")
Math425PastGrades[Math425PastGrades == "N"] <- "0"
Math425PastGrades[Math425PastGrades == "Y"] <- "1"
Math425PastGrades[Math425PastGrades == "M"] <- "1"
Math425PastGrades[Math425PastGrades == "F"] <- "2"
Math425PastGrades[is.na(Math425PastGrades)] <- 0
Math425PastGrades$Section <- NULL
Math425PastGrades$Gender <- as.numeric(as.character(Math425PastGrades$Gender))
Math425PastGrades$AttendedAlmostAlways <- as.numeric(as.character(Math425PastGrades$AttendedAlmostAlways))
Math425PastGrades$SpentTimeInOfficeHours <- as.numeric(as.character(Math425PastGrades$SpentTimeInOfficeHours))
Math425PastGrades$ClassActivitiesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$ClassActivitiesCompletedPerfectly))
Math425PastGrades$SkillsQuizzesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$SkillsQuizzesCompletedPerfectly))
Math425PastGrades <- Math425PastGrades %>%
mutate(Analysis = Analysis_CarPrices + Analysis_PredWeather + Theory_Residuals + Theory_SamplingDists,
Assignments = ifelse(Analysis >= 52, 1, 0))
lm7 <- lm(FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + Assignments, data=Math425PastGrades)
Problem 1 Consider the ?Math425pastgrads data set. The goal of this question is to find a “best model” for predicting which is best, keeping the midterm score and averaging it with the ifnal score, or dropping the midterm score and only going for the final score. I need to decide if I want to rely solely on my final score or not.
I have decided to drop my midterm score.
I didn’t do as well as I would have liked on the midterm. It wasn’t a lack of knowledge, but there were other factors that went into my 24/100. As I created this prediction for my final score, I used my midterm score, and looked at the interaction it had when Skills Quizzes were completed perfectly (1 being completed 0 being incomplete) and the Assignments all completed to at least a score of 13 (13 and above being 1 12 and below being 0).
pander (predict(lm7, newdata = data.frame(Midterm = 24, SkillsQuizzesCompletedPerfectly= 1, Assignments=1), interval= "prediction"))
| fit | lwr | upr |
|---|---|---|
| 58.64 | 20 | 97.27 |
Based on my midterm score and how I interacted with the other lines, I have the possibility of getting a 20 on the midterm, or even a 97.27. But since it is a possibility I could get even lower than my midterm, I have decided to drop it.
lm7 <- lm(FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + Assignments, data=Math425PastGrades)
pander(summary(lm7))
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 40.33 | 11.51 | 3.505 | 0.0006375 |
| Midterm | -0.3891 | 0.364 | -1.069 | 0.2872 |
| I(Midterm^2) | 0.007062 | 0.002905 | 2.431 | 0.0165 |
| SkillsQuizzesCompletedPerfectly | 10.52 | 3.347 | 3.144 | 0.002087 |
| Assignments | 13.05 | 3.689 | 3.537 | 0.0005717 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 128 | 18.43 | 0.4144 | 0.3953 |
plot(FinalExam ~ Midterm, data=Math425PastGrades,
col=as.factor(SkillsQuizzesCompletedPerfectly))
palette(c("hotpink", "purple", "skyblue", "yellowgreen"))
b<- coef(lm7)
b
## (Intercept) Midterm
## 40.334304950 -0.389061830
## I(Midterm^2) SkillsQuizzesCompletedPerfectly
## 0.007062428 10.522871619
## Assignments
## 13.048160951
drawit <- function(SkillsQuizzesCompletedPerfectly=1, Assignments=1, i=1){
curve(b[1] + b[2]*Midterm + b[3]*Midterm^2 + b[4]*SkillsQuizzesCompletedPerfectly + b[5]*Assignments, add=TRUE, xname="Midterm", col=palette()[i])
}
drawit(1,0,1)
drawit(0,1,2)
drawit(1,1,3)
drawit(0,0,4)
Part (a)
Determine which row is most useful in explaining y
Here is me editing my data
Math425PastGrades[Math425PastGrades == "N"] <- "0"
Math425PastGrades[Math425PastGrades == "Y"] <- "1"
Math425PastGrades[Math425PastGrades == "M"] <- "1"
Math425PastGrades[Math425PastGrades == "F"] <- "2"
Math425PastGrades[is.na(Math425PastGrades)] <- 0
Math425PastGrades$Section <- NULL
Math425PastGrades$Gender <- as.numeric(as.character(Math425PastGrades$Gender))
Math425PastGrades$AttendedAlmostAlways <- as.numeric(as.character(Math425PastGrades$AttendedAlmostAlways))
Math425PastGrades$SpentTimeInOfficeHours <- as.numeric(as.character(Math425PastGrades$SpentTimeInOfficeHours))
Math425PastGrades$ClassActivitiesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$ClassActivitiesCompletedPerfectly))
Math425PastGrades$SkillsQuizzesCompletedPerfectly <- as.numeric(as.character(Math425PastGrades$SkillsQuizzesCompletedPerfectly))
pairs(Math425PastGrades, panel=panel.smooth)
pairs(Math425PastGrades, panel=panel.smooth, col=as.factor(Math425PastGrades$ClassActivitiesCompletedPerfectly))
general tools for multiple linear regression (trying to find the true model)
Using Assessment quizzes- I have significant P values,but not a super significant R squared
lm1 <- lm(FinalExam ~ Midterm + AssessmentQuizzes , data=Math425PastGrades)
summary(lm1)
##
## Call:
## lm(formula = FinalExam ~ Midterm + AssessmentQuizzes, data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -84.198 -9.025 2.299 11.862 34.467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.97213 6.55979 2.130 0.035134 *
## Midterm 0.34801 0.09953 3.496 0.000653 ***
## AssessmentQuizzes 0.48903 0.08309 5.885 3.42e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.36 on 125 degrees of freedom
## Multiple R-squared: 0.4093, Adjusted R-squared: 0.3998
## F-statistic: 43.3 on 2 and 125 DF, p-value: 5.157e-15
par(mfrow=c(1,3))
plot(lm1, which=1:2)
plot(lm1$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm1$res, Fit = lm1$fit, Math425PastGrades), panel=panel.smooth)
I’m now trying based on the analysis. I dont have significant p values, so im rejecting this option
lm2 <- lm(FinalExam ~ Midterm + Analysis_PredWeather + Analysis_CarPrices , data=Math425PastGrades)
summary(lm2)
##
## Call:
## lm(formula = FinalExam ~ Midterm + Analysis_PredWeather + Analysis_CarPrices,
## data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -72.377 -9.108 2.719 11.586 43.183
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.17901 8.45242 0.139 0.889
## Midterm 0.51781 0.09887 5.237 6.77e-07 ***
## Analysis_PredWeather 1.02986 0.77131 1.335 0.184
## Analysis_CarPrices 0.95507 0.72139 1.324 0.188
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.55 on 124 degrees of freedom
## Multiple R-squared: 0.3355, Adjusted R-squared: 0.3194
## F-statistic: 20.87 on 3 and 124 DF, p-value: 5.205e-11
par(mfrow=c(1,3))
plot(lm2, which=1:2)
plot(lm2$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm2$res, Fit = lm2$fit, Math425PastGrades), panel=panel.smooth)
Honestly, I havent done a simple quadratic yet, so im going to try that,
okay, really low R squared. not doing that.
lm3 <- lm(FinalExam ~ Midterm + I(Midterm^2) , data=Math425PastGrades)
summary(lm3)
##
## Call:
## lm(formula = FinalExam ~ Midterm + I(Midterm^2), data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -71.321 -7.646 4.332 12.656 44.286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.109867 12.550536 3.355 0.00105 **
## Midterm -0.282206 0.400281 -0.705 0.48211
## I(Midterm^2) 0.007500 0.003196 2.347 0.02052 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.31 on 125 degrees of freedom
## Multiple R-squared: 0.2774, Adjusted R-squared: 0.2658
## F-statistic: 23.99 on 2 and 125 DF, p-value: 1.518e-09
par(mfrow=c(1,3))
plot(lm3, which=1:2)
plot(lm3$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm3$res, Fit = lm3$fit, Math425PastGrades), panel=panel.smooth)
I’m going to try theory assignments
lm4 <- lm(FinalExam ~ Midterm + Theory_Residuals + Theory_SamplingDists , data=Math425PastGrades)
summary(lm4)
##
## Call:
## lm(formula = FinalExam ~ Midterm + Theory_Residuals + Theory_SamplingDists,
## data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -76.954 -8.457 2.270 11.887 40.386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.26046 8.16393 2.114 0.036498 *
## Midterm 0.38935 0.09977 3.902 0.000155 ***
## Theory_Residuals -0.16462 0.57377 -0.287 0.774661
## Theory_SamplingDists 2.06763 0.47550 4.348 2.83e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.72 on 124 degrees of freedom
## Multiple R-squared: 0.3911, Adjusted R-squared: 0.3763
## F-statistic: 26.54 on 3 and 124 DF, p-value: 2.492e-13
par(mfrow=c(1,3))
plot(lm4, which=1:2)
plot(lm4$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm4$res, Fit = lm4$fit, Math425PastGrades), panel=panel.smooth)
Office hours also has a pretty low R squared and an insignificant p value
lm5 <- lm(FinalExam ~ Midterm + SpentTimeInOfficeHours , data=Math425PastGrades)
summary(lm5)
##
## Call:
## lm(formula = FinalExam ~ Midterm + SpentTimeInOfficeHours, data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.522 -9.103 4.838 12.478 49.670
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.68978 7.60253 2.327 0.0216 *
## Midterm 0.62895 0.09867 6.374 3.23e-09 ***
## SpentTimeInOfficeHours 0.64022 3.68476 0.174 0.8623
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.75 on 125 degrees of freedom
## Multiple R-squared: 0.2457, Adjusted R-squared: 0.2337
## F-statistic: 20.36 on 2 and 125 DF, p-value: 2.212e-08
par(mfrow=c(1,3))
plot(lm5, which=1:2)
plot(lm5$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm5$res, Fit = lm5$fit, Math425PastGrades), panel=panel.smooth)
Okay I’m going to try a model using assessment quizzes, theory
assignments,
lm6 <- lm(FinalExam ~ Midterm*AssessmentQuizzes , data=Math425PastGrades)
summary(lm6)
##
## Call:
## lm(formula = FinalExam ~ Midterm * AssessmentQuizzes, data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -84.195 -9.029 2.290 11.855 34.465
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.388e+01 1.361e+01 1.019 0.3100
## Midterm 3.494e-01 1.976e-01 1.768 0.0795 .
## AssessmentQuizzes 4.915e-01 3.152e-01 1.559 0.1215
## Midterm:AssessmentQuizzes -3.253e-05 4.091e-03 -0.008 0.9937
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.44 on 124 degrees of freedom
## Multiple R-squared: 0.4093, Adjusted R-squared: 0.395
## F-statistic: 28.64 on 3 and 124 DF, p-value: 3.88e-14
par(mfrow=c(1,3))
plot(lm6, which=1:2)
plot(lm6$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm6$res, Fit = lm6$fit, Math425PastGrades), panel=panel.smooth)
Math425PastGrades <- Math425PastGrades %>%
mutate(Analysis = Analysis_CarPrices + Analysis_PredWeather + Theory_Residuals + Theory_SamplingDists,
Assignments = ifelse(Analysis >= 52, 1, 0))
lm7 <- lm(FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly + Assignments, data=Math425PastGrades)
summary(lm7)
##
## Call:
## lm(formula = FinalExam ~ Midterm + I(Midterm^2) + SkillsQuizzesCompletedPerfectly +
## Assignments, data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -67.457 -6.016 1.237 12.194 33.465
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.334305 11.507329 3.505 0.000638 ***
## Midterm -0.389062 0.363990 -1.069 0.287216
## I(Midterm^2) 0.007062 0.002905 2.431 0.016498 *
## SkillsQuizzesCompletedPerfectly 10.522872 3.346596 3.144 0.002087 **
## Assignments 13.048161 3.689094 3.537 0.000572 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.43 on 123 degrees of freedom
## Multiple R-squared: 0.4144, Adjusted R-squared: 0.3953
## F-statistic: 21.76 on 4 and 123 DF, p-value: 1.357e-13
ggplot(Math425PastGrades, aes(x=Midterm, y=FinalExam, color=interaction(SkillsQuizzesCompletedPerfectly, Assignments))) +
geom_point(pch=1) +
geom_point(aes(y=lm7$fit, cex=0.05)) +
facet_wrap(~interaction(SkillsQuizzesCompletedPerfectly, Assignments))
lm8 <- lm(FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly * SpentTimeInOfficeHours , data=Math425PastGrades)
summary(lm8)
##
## Call:
## lm(formula = FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly *
## SpentTimeInOfficeHours, data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.349 -8.551 3.441 12.083 39.186
##
## Coefficients:
## Estimate
## (Intercept) 16.8141
## Midterm 0.5897
## SkillsQuizzesCompletedPerfectly -2.0864
## SpentTimeInOfficeHours -22.9450
## Midterm:SkillsQuizzesCompletedPerfectly 0.1433
## Midterm:SpentTimeInOfficeHours 0.2438
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours 53.1088
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours -0.6074
## Std. Error
## (Intercept) 13.9726
## Midterm 0.1886
## SkillsQuizzesCompletedPerfectly 22.0220
## SpentTimeInOfficeHours 19.2159
## Midterm:SkillsQuizzesCompletedPerfectly 0.2973
## Midterm:SpentTimeInOfficeHours 0.2594
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours 28.6339
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours 0.3845
## t value Pr(>|t|)
## (Intercept) 1.203 0.23120
## Midterm 3.126 0.00222
## SkillsQuizzesCompletedPerfectly -0.095 0.92468
## SpentTimeInOfficeHours -1.194 0.23481
## Midterm:SkillsQuizzesCompletedPerfectly 0.482 0.63073
## Midterm:SpentTimeInOfficeHours 0.940 0.34912
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours 1.855 0.06609
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours -1.580 0.11681
##
## (Intercept)
## Midterm **
## SkillsQuizzesCompletedPerfectly
## SpentTimeInOfficeHours
## Midterm:SkillsQuizzesCompletedPerfectly
## Midterm:SpentTimeInOfficeHours
## SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours .
## Midterm:SkillsQuizzesCompletedPerfectly:SpentTimeInOfficeHours
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.62 on 120 degrees of freedom
## Multiple R-squared: 0.3526, Adjusted R-squared: 0.3149
## F-statistic: 9.339 on 7 and 120 DF, p-value: 3.347e-09
par(mfrow=c(1,3))
plot(lm8, which=1:2)
plot(lm8$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm8$res, Fit = lm8$fit, Math425PastGrades), panel=panel.smooth)
lm9 <- lm(FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly * AttendedAlmostAlways , data=Math425PastGrades)
summary(lm9)
##
## Call:
## lm(formula = FinalExam ~ Midterm * SkillsQuizzesCompletedPerfectly *
## AttendedAlmostAlways, data = Math425PastGrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.377 -8.653 1.035 11.197 38.987
##
## Coefficients:
## Estimate
## (Intercept) -1.5466
## Midterm 0.7375
## SkillsQuizzesCompletedPerfectly 44.6072
## AttendedAlmostAlways 18.5601
## Midterm:SkillsQuizzesCompletedPerfectly -0.4062
## Midterm:AttendedAlmostAlways -0.1205
## SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways -36.4691
## Midterm:SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways 0.4040
## Std. Error t value
## (Intercept) 13.1645 -0.117
## Midterm 0.1858 3.969
## SkillsQuizzesCompletedPerfectly 19.3057 2.311
## AttendedAlmostAlways 19.3442 0.959
## Midterm:SkillsQuizzesCompletedPerfectly 0.2884 -1.408
## Midterm:AttendedAlmostAlways 0.2618 -0.460
## SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways 28.2556 -1.291
## Midterm:SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways 0.3931 1.028
## Pr(>|t|)
## (Intercept) 0.906672
## Midterm 0.000123 ***
## SkillsQuizzesCompletedPerfectly 0.022564 *
## AttendedAlmostAlways 0.339253
## Midterm:SkillsQuizzesCompletedPerfectly 0.161680
## Midterm:AttendedAlmostAlways 0.646304
## SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways 0.199294
## Midterm:SkillsQuizzesCompletedPerfectly:AttendedAlmostAlways 0.306193
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.56 on 120 degrees of freedom
## Multiple R-squared: 0.3566, Adjusted R-squared: 0.319
## F-statistic: 9.499 on 7 and 120 DF, p-value: 2.39e-09
par(mfrow=c(1,3))
plot(lm9, which=1:2)
plot(lm9$residuals, main="Residuals vs Order", xlab="",
ylab="Residuals")
pairs(cbind(R = lm9$res, Fit = lm9$fit, Math425PastGrades), panel=panel.smooth)